A COGNITIVE CONTROL FRAMEWORK FOR AUTOMATIC CONTROL OF
APPLICATION PROGRAMS EXPOSING A GRAPHICAL USER INTERFACE

BACKGROUND 
FIELD 

The present invention relates generally to automatic control of software
application programs and image analysis and, more specifically, to analyzing graphical
user interface (GUI) images displayed by an application program for automatic control
of subsequent execution of the application program.

DESCRIPTION 

Typical application program analysis systems capture keyboard input data and
mouse input data entered by a user. The captured input data may then be used to replay
the application program. These systems rely on playback of the application program on
the same computer system used to capture the input data, and thus are not portable.

Some existing application program analysis systems use image recognition
techniques that are dependent on screen resolution and/or drawing schemes, or have
strong dependencies on the underlying operating system (OS) being used. Such systems
typically rely on dependencies such as Windows32 or X-Windows application
programming interfaces (APIs). This limits their portability and usefulness.

Hence, better techniques for analyzing the GUIs of application programs are
desired.

BRIEF DESCRIPTION OF THE DRAWINGS 
The features and advantages of the present invention will become apparent from 
the following detailed description of the present invention in which: 

Figure 1 is a diagram of a cognitive control framework system according to an
embodiment of the present invention;

Figure 2 is a flow diagram illustrating processing in a cognitive control
framework according to an embodiment of the present invention;

Figure 3 is an example display of the GUI of an application program captured
and saved during a recording phase;

Figure 4 is an example display of the GUI of an application program captured
during a playback phase;

Figure 5 is an example image illustrating objects identified during contouring
operations of the recording phase according to an embodiment of the present invention;

Figure 6 is an example image illustrating objects of activity of the recording
phase according to an embodiment of the present invention;

Figure 7 is an example image illustrating objects identified during contouring
operations of the playback phase according to an embodiment of the present invention;
and

Figure 8 is an example image illustrating a hypothesis during the playback phase
according to an embodiment of the present invention.

DETAILED DESCRIPTION 

An embodiment of the present invention is a cognitive control framework (CCF)
for automatic control of software application programs that have a graphical user
interface (GUI). Such application programs may be executed on current
operating systems such as Microsoft Windows® and Linux, as well as
other operating systems. An embodiment of the present invention creates a system
simulating a human user interacting with the GUI of the application program and using
the GUI for automatic control of the application program without relying on
dependencies such as specific graphical libraries, windowing systems, or visual controls
interfaces or implementations. The CCF comprises an easy-to-use cross-platform tool
useful for GUI testing based on pattern recognition. By being independent of any
OS-specific controls and graphical libraries, the CCF may be used for interaction with
non-standard graphical interfaces as well as with well-known ones. The system provides for
recording any kind of keyboard and mouse actions the user performs while working
with the GUI of the application program and then providing playback of the recorded
scenario. In the present invention, image analysis of captured display data (such as
screen shots, for example) is performed to identify actions of the application program
corresponding to user input data. These actions and input data may be stored for use in
future playback of the same user scenario for automatically interacting with the
application program.

Embodiments of the present invention operate in two phases: a
recording phase and a playback phase. During the recording phase, the system is
"learning" how to control the application program. The system registers and captures
input actions supplied by the user (such as a mouse click or entering of text via a
keyboard, for example) and display data (e.g. screen shots) of images displayed by the
application program in response to those actions. The user actions, the time interval
between actions, resulting display data of the GUI of the application program, and
possibly other data and/or commands form an execution scenario. By following the
execution scenario, during the playback phase the system provides the same but fully
automatic execution of the application program (simulating the user control but without
the real presence of the user). Automatic execution is made possible by a plurality of
image analysis and structural techniques applied correspondingly to images during the
recording and playback phases.

Figure 1 is a diagram of a cognitive control framework (CCF) system 100
according to an embodiment of the present invention. Figure 1 shows two components,
recording component 102 and playback component 104. These components may be
implemented in software, firmware, or hardware, or a combination of software,
firmware and hardware. In the recording component, the CCF system registers and
captures user input activity at block 106. For example, the user may make input choices
over time to an application program being executed by a computer system using a
mouse, keyboard, or other input device. This input data is captured and stored by the
CCF system. Next, at block 108, the display data may be captured (e.g. screen shots are
taken). In one embodiment, the display data may be captured only when user input has
been received by the application program. The display data is also saved. At block 110,
the data captured during blocks 106 and 108 may be analyzed and saved. These
processes may be repeated a plurality of times. The result of the processing of the
recording component comprises an execution scenario 112 for the application program
being processed by the system. In one embodiment, the execution scenario comprises a
script containing Extensible Markup Language (XML) tags. The execution scenario
describes a sequence of user inputs to the application program, corresponding display
images on a GUI of the application program, and commands directing the application
program to perform some actions.
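
As an illustration of how an XML execution scenario might be read back, the following minimal sketch parses a short script into an ordered list of steps. The Screenshot, Sleep and Mouse tags follow the examples that appear in the pseudo-code of Table III; the Scenario root element and the load_scenario helper are assumptions introduced only for this sketch and are not part of the described embodiment.

```python
import xml.etree.ElementTree as ET

# Hypothetical scenario text. The <Screenshot>, <Sleep> and <Mouse> tags follow
# the examples in the specification's pseudo-code; the <Scenario> root element
# is an assumption added so that the document is well-formed XML.
SCENARIO_XML = """
<Scenario>
  <Screenshot fileName="1.png"/>
  <Sleep duration="2000"/>
  <Mouse action="RightClick" x="100" y="200"/>
</Scenario>
"""


def load_scenario(xml_text):
    """Return the scenario as an ordered list of (tag, attributes) steps."""
    root = ET.fromstring(xml_text)
    return [(step.tag, dict(step.attrib)) for step in root]


steps = load_scenario(SCENARIO_XML)
```

Keeping the steps in document order preserves the sequence of user inputs, which is what the playback component needs in order to replay them one by one.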

At a later point in time, during the playback phase, the playback component 104
may be initiated. At block 114, simulated user activity may be generated based on the
execution scenario. That is, saved inputs and commands from the execution scenario
may be input to the application program for purposes of automatic control using the
CCF system. While the application program processes this data, display data may be
changed on the display as a result. At block 116, the CCF system performs image
analysis on the playback display data currently being shown as a result of application
program processing and the display data captured during the recording phase. At block
118, recorded time conditions may be checked to take into account possible variations
in playback. For example, the time when an object appears may be within a time
interval based on a recorded time. In one embodiment, a lower bound time
(time to start the search) may be extracted from the saved data in the execution scenario
and an upper bound time may be the lower bound time plus 10%, or some other
appropriate value. Processing of blocks 114, 116, and 118 each result in data being
stored in report 120. At block 119, the CCF system controls execution of the application
program based on the results of the image analysis. Blocks 114, 116 and 118 may be
repeated for each in a sequence of user input data items from the execution scenario.

The time interval between sequential actions is a part of the captured execution
scenario. However, while following the execution scenario in the playback phase, one
should not expect that the time interval between any two actions at playback will be
equal to the time interval between the same two actions during the recording phase.
There are a number of objective reasons why this interval could be different on
playback than during recording. For example, the application program during recording
and playback may be executed on different computer systems having different processor
speeds, or an application program could require different times for the same actions
during playback due to accesses of external data or resources. This indicates a
requirement in the CCF system to handle flexible time conditions, e.g. handle some
tolerance for the time interval between actions during the playback phase. During that
time interval at playback, the system compares the recorded display data to the playback
display data several times to determine if the playback display data is substantially
similar to the recorded display data. A finding that the two are substantially similar
indicates that a previous user action has completed and the system can progress to the
next action in the execution scenario. This activity may be similar to the situation where
the user is interacting with the application program and pauses periodically to view the
display to determine if the expected visible changes to the display have been made by
the application program based on previous actions. If so, then a new action may be
performed. If at the end of the upper bound of the time interval the application program
has not produced an image on the display that the CCF system expected according to
the execution scenario, then the CCF system may interrupt the playback of the
execution scenario and generate an error report describing how the execution scenario
has not been followed. In one embodiment, the scenario may be corrected and the CCF
system may be required to use other branches to continue.
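
The flexible time condition described above can be sketched as a polling loop bounded by a search window. This is only an illustrative sketch: the function names, the polling interval, and the find_object callable (assumed to capture the current display, search it, and return the matched object or None) are hypothetical, and the 10% default tolerance simply mirrors the example value mentioned earlier in this description.

```python
import time


def search_window(recorded_delay_s, tolerance=0.10):
    """Return (lower, upper) bounds in seconds for when to look for the object.

    The lower bound is the recorded delay; the upper bound adds a tolerance
    (10% by default, mirroring the example value in the description).
    """
    return recorded_delay_s, recorded_delay_s * (1.0 + tolerance)


def wait_for_match(find_object, recorded_delay_s, tolerance=0.10,
                   poll_interval_s=0.05, clock=time.monotonic, sleep=time.sleep):
    """Poll find_object() inside the search window; return its result or None.

    find_object is a hypothetical callable that captures the current display,
    searches it, and returns the matched object or None when nothing matches.
    """
    lower, upper = search_window(recorded_delay_s, tolerance)
    start = clock()
    sleep(lower)                      # nothing is expected before the lower bound
    while True:
        found = find_object()
        if found is not None:
            return found              # previous action completed; proceed
        if clock() - start >= upper:
            return None               # caller interrupts playback and reports an error
        sleep(poll_interval_s)
```

The clock and sleep parameters are injectable so that the loop can be exercised with a simulated clock; in normal use the defaults would be left in place.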

The cognitive control framework (CCF) system of embodiments of the present
invention performs image analysis and object detection processing on display data from
the GUI of the application program. The CCF system compares an image
captured during a recording phase (called IR) to the corresponding image captured
during the playback phase (called IP). One task of the system is to detect an object in
the IR to which the user applied an action, find the corresponding object in the IP, and
continue progress on the execution path of the execution scenario by applying the action
to the detected object. These steps may be repeated for multiple objects within an
image, and may be repeated across multiple pairs of IRs and IPs over time. An object
that the user has applied an action to may be called an "object of action." Absence in the
IP of the object of action corresponding to the one found in the IR means that one should
capture the IP again at a later time and try to find the object of action again. Finally,
either an object of action may be found in the IP or execution of the scenario may be
halted and a report generated describing how the wrong state was achieved and the
scenario may not be continued. In embodiments of the present invention, this detection
of objects of action may be done in real time during the playback phase, progressing
from one action to another. Thus, the image analysis process employed must have good
performance so as to introduce only a minimal disturbance to the time conditions at
playback.

The CCF system of embodiments of the present invention comprises an image
analysis and object detection process. Such a process has at least two requirements. First, the
process should be able to overcome some variations in the captured images such as
different color schemes, fonts, and the layout and state of the visual elements. In one
embodiment, comparison constraints for checking these items (color scheme, fonts, etc.)
may be set to specified parameters in accordance with specific needs. Overcoming these
variations is desirable because recording and playback might be executed in different
operating environments such as different screen resolutions, different visual schemes,
different window layouts, and so on. Additionally, there could be insignificant
differences in corresponding IR (usually captured after an action was applied to an
object of interest) and IP pairs (captured after a previous action was completed).
Second, the implementation of the image analysis and object detection process should
be fast enough to introduce only minimal disturbances and delay of application
execution during playback.

By processing captured images, the system builds descriptions of the images in
terms of the objects presented on them. Each display object may be represented by its
contour and a plurality of properties. Table I enumerates some possible contour
properties for use in the present invention. In other embodiments, other properties may
also be used.



Table I. Contour properties

Property       Description
--------       -----------
Location       Coordinates (on the image) of the contour center.
Image size     Characteristic contour size. For rectangular contours, these are
               simply the vertical and horizontal sizes. For controls of more
               complicated shape, another format may be used.
Layout         Connection to other contours that lie in proximity to its
               boundaries/layout pattern of this contour.
Content type   Indicates what is inside of the contour: text, image or a
               combination.
Content        If the content type is text, then a text string; if image (e.g.
               icon), then the image.
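
One possible in-memory representation of a display object with the properties of Table I is sketched below. The class name, field names, the encoding of the layout property as a list of neighbour indices, and the bounding_box helper are all assumptions made for illustration; the specification does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Contour:
    """One display object, described by the properties listed in Table I."""
    location: Tuple[int, int]          # (x, y) of the contour center on the image
    size: Tuple[int, int]              # (width, height) for a rectangular contour
    layout: List[int]                  # indices of neighbouring contours (assumed encoding)
    content_type: str                  # "text", "image" or "combination"
    content: Union[str, bytes, None]   # text string, or raw image data for an icon

    def bounding_box(self):
        """Return (left, top, right, bottom) derived from center and size."""
        cx, cy = self.location
        w, h = self.size
        return (cx - w // 2, cy - h // 2, cx + w // 2, cy + h // 2)


# Hypothetical example object: a text item centered at (100, 200).
button = Contour(location=(100, 200), size=(100, 100), layout=[],
                 content_type="text", content="Tuning Activity")
```

Storing the center and size separately keeps the size-based and layout-based filters simple, since each can read the one property it needs without recomputing the other.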



Figure 2 is a flow diagram illustrating processing of a CCF system according to
an embodiment of the present invention. During the recording phase 220 handled by
recording component 102, at block 200 the system determines contours of objects in the
IR. At block 202, the system detects a current object of activity. At block 204, the
system detects additional objects adjacent to the current object of activity in the IR.
These steps (200, 202, and 204) may be repeated over time for all objects of activity
during execution of the application program in the recording phase.

Next, during the playback phase 222 handled by playback component 104, at
block 206 the CCF system determines the contours of objects in the IP. At block 208,
the CCF system filters contours by size to determine contours that may become
hypotheses for active objects and contours that connect them. At block 210, the CCF
system filters the objects by basic space layout in the IP to determine subsets of
hypotheses for active and additional objects. For example, filtering criteria for space
layout may include tables, wizards, and menus. In one embodiment, the user (or CCF
schema with a cascade search) could set both strict (e.g. "as is") and fuzzy (e.g. "objects
could be near each other") conditions. At block 212, the CCF system filters the objects
by content to produce further subsets of hypotheses for active and additional objects.
For example, the filtering criteria by content may include images and text. Moreover, in
one embodiment, the user (or CCF schema with cascade search) could set both strict
(e.g. "image should have difference in a few points and text should have minimal
differences on a base of Levenshtein distance") and fuzzy (e.g. "image could be stable to
highlighting and have insignificant structural changes and text could have noticeable
differences on a base of Levenshtein distance without consideration of digits")
conditions. At block 214, the CCF system performs structural filtering of the objects to
produce a best hypothesis for active objects.
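
The text side of the content filter can be illustrated with a standard Levenshtein (edit) distance and a threshold. The sketch below is an assumption about how strict and fuzzy conditions might be expressed; the function names, the max_distance threshold, and the ignore_digits flag are hypothetical, and the text is assumed to have already been extracted (e.g. by OCR).

```python
import re


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def text_matches(recorded, candidate, max_distance, ignore_digits=False):
    """Accept the candidate if its edit distance to the recorded text is small.

    With ignore_digits=True, digits are stripped from both strings first, so
    labels that differ only in numeric values still match (a fuzzy condition).
    """
    if ignore_digits:
        recorded = re.sub(r"\d", "", recorded)
        candidate = re.sub(r"\d", "", candidate)
    return levenshtein(recorded, candidate) <= max_distance


def filter_by_text(recorded, candidates, max_distance, ignore_digits=False):
    """Keep only candidate strings that pass the text comparison."""
    return [c for c in candidates
            if text_matches(recorded, c, max_distance, ignore_digits)]
```

A small max_distance corresponds to the strict condition, while a larger value combined with ignore_digits=True approximates the fuzzy condition quoted above.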

Finally, at block 216, the CCF system recalculates old actions for a new object
by applying the action according to the execution scenario. For example, suppose the
user selected (via the mouse) the screen location at (X=70, Y=200), and that a button is
displayed at the rectangle denoted (X1=50, Y1=150, X2=100, Y2=100). In the IP, the
button may be represented as a rectangle denoted (X1=250, Y1=300, X2=200,
Y2=100). In general, both the coordinates of the top left corner and the size of the
rectangle may change. The mouse click (user selection) may be recalculated based
on the position of the button and the scaled size (for X and Y coordinates). The
calculation gives the new mouse click coordinates (e.g., X=290, Y=350).
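
The recalculation in this example can be reproduced with a short sketch. It assumes that each rectangle is given as the top left corner (X1, Y1) followed by the width and height (X2, Y2), which is the reading under which the example's numbers are consistent; the function name and the rounding to whole pixels are illustrative choices.

```python
def recalculate_click(click, recorded_rect, playback_rect):
    """Map a recorded click onto the matched object in the playback image.

    Rectangles are (x, y, width, height) with (x, y) the top left corner.
    The click keeps the same relative position inside the object, scaled
    independently along X and Y.
    """
    cx, cy = click
    rx, ry, rw, rh = recorded_rect
    px, py, pw, ph = playback_rect
    new_x = px + (cx - rx) * pw / rw
    new_y = py + (cy - ry) * ph / rh
    return round(new_x), round(new_y)


# Values from the example in the text.
new_click = recalculate_click((70, 200), (50, 150, 100, 100), (250, 300, 200, 100))
```

The click lies 20% of the way across and 50% of the way down the recorded button, so it lands at 250 + 0.2 x 200 = 290 and 300 + 0.5 x 100 = 350 in the playback image.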

Table II shows the input data and output of the image analysis process for Figure 2.

Table II. Image Analysis Processing

Working with image from recording phase (IR) by recording component

Step 1. Contouring
    Input data: Image from recording (IR).
    Result: Contours.
    Input parameters and description: Thresholds, distances between objects
    (with some tolerance). Intel® OpenCV library used in one embodiment.

Step 2. Detecting object of activity
    Input data: Image IR and contours from previous step.
    Result: Contour representing object of activity.
    Input parameters and description: Typical object size (with tolerance) for
    object of action. Optical character recognition (OCR) and fuzzy text
    comparison, e.g. with Levenshtein distance.

Step 3. Detecting additional objects around object of activity
    Input data: Image IR, contours and active objects.
    Result: Additional objects and their layout against object of action.
    Input parameters and description: Typical object size (with tolerance) for
    additional objects. Structural analysis, e.g. "criss-cross" rules.

Working with image from playback phase (IP) by playback component

Step 4. Contouring
    Input data: Image from playback (IP).
    Result: Contours.
    Input parameters and description: Thresholds, distances between objects
    (with some tolerance). Intel® OpenCV library used in one embodiment.

Step 5. Filtering by size
    Input data: Contours from previous step.
    Result: Contours that become hypotheses for active object and contours
    connected with them.
    Input parameters and description: Mean object size (with tolerance) based
    on active object characteristics evaluated at Step 2. Typical object size
    (with tolerance) for additional objects. Filtering out contours that don't
    fit into input size limits.

Step 6. Filtering by basic space layout
    Input data: Subsets of hypotheses for active and additional objects.
    Result: Subsets of hypotheses for active and additional objects.
    Input parameters and description: Fuzzy distance filtration. Fuzzy
    filtration for directions.

Step 7. Filtering by content
    Input data: Subsets of hypotheses for active and additional objects.
    Result: Decreased subsets of hypotheses for active and additional objects.
    Input parameters and description: OCR and fuzzy text comparison, e.g. with
    Levenshtein distance. Fuzzy image comparison. Using "fuzzy content type"
    method for filtration.

Step 8. Structural filtering
    Input data: Subsets of hypotheses for active and additional objects.
    Result: The best hypothesis for active objects.
    Input parameters and description: Method based on fuzzy triple links both
    between objects from IR and their hypotheses from IP. It's stable to
    additional objects which don't have strong structural links with active
    object. Moreover, one can use the result of this method to choose the best
    hypotheses for active objects. Some other methods, e.g. Hough
    transformation, may also be used here.

Step 9. Recalculating old actions for new object
    Input data: Object of action.
    Result: Applied the action according to the execution scenario.
    Input parameters and description: Recalculating action coordinates in IP
    (playback image) coordinate system.



During filtering at each step there is an evaluation of specific contour properties
(as required for a specific filter). This filtering pipeline is designed in such a way that
the most time consuming evaluation steps are shifted to later in the processing pipeline
when the number of contours (hypotheses) is smaller. By using this approach, the
overall computational cost may be decreased, thereby helping to ensure good
performance of the system.
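
The ordering idea behind this pipeline can be sketched as a cascade in which each filter only sees the survivors of the previous one. The run_cascade helper, the tuple representation of candidates, and the two example predicates are assumptions made for illustration; in practice the content stage would involve OCR or image comparison rather than a plain string test.

```python
def run_cascade(hypotheses, filters):
    """Apply filters in order; each stage sees only survivors of the previous one.

    filters is a list of (name, predicate) pairs ordered from cheapest to most
    expensive, so costly predicates run on the fewest candidates.
    Returns the survivors and the number of evaluations spent per stage.
    """
    evaluations = {}
    for name, predicate in filters:
        evaluations[name] = len(hypotheses)
        hypotheses = [h for h in hypotheses if predicate(h)]
        if not hypotheses:
            break                     # nothing left to test; object not found
    return hypotheses, evaluations


# Hypothetical candidates: (width, height, text) triples standing in for contours.
candidates = [(100, 20, "Tuning Activity"), (400, 300, ""), (98, 21, "Sampling"),
              (5, 5, ""), (101, 19, "Tuning Activity")]

filters = [
    # Cheap test: size within tolerance of the recorded object (100 x 20).
    ("size", lambda c: abs(c[0] - 100) <= 10 and abs(c[1] - 20) <= 5),
    # Costlier test in practice (OCR plus fuzzy comparison); exact match here.
    ("content", lambda c: c[2] == "Tuning Activity"),
]

survivors, evaluations = run_cascade(candidates, filters)
```

In this example the size filter examines all five candidates while the content filter examines only the three that survive it, which is the cost saving the paragraph above describes.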

It is useful to maintain a compromise in order to make sure that the system does
not filter out some contours in the early steps that may be later determined to be either a
hypothesis of an object of activity or objects connected with an object of activity. In this
regard, predefined input parameters may be set to broad limits that require spending a
little more time on processing of additional contours (hypotheses), but ensure that the
system has not dropped important contours.
Example pseudo-code for one embodiment of the present invention is shown in
Table III.

Table III. Pseudo Code Example

BEGIN CCF
«««« Recording »»»»
LOOP /*recording, e.g. till a special key combination*/
    Wait on user action /*mouse, keyboard, it's possible to set something else*/
    Hook and save screenshot /*e.g. <Screenshot fileName="1.png"/>*/
    Save time interval from the previous action /*e.g. <Sleep duration="2000"/>*/
    Save information about user action
        /*e.g. <Mouse action="RightClick" x="100" y="200"/>*/
END LOOP /*recording, e.g. till a special key combination*/
EXIT

«««« Post-processing »»»»
Process saved data into a more compact form. It's possible for the user to change it for
his or her needs.

«««« Playback »»»»
LOOP /*till the end of saved data*/
    Load time interval and wait in accordance with it.
    IF [actions depend on coordinates on the screen] /*e.g. mouse click*/ THEN
        Load saved screenshot
        Detect object of action /*e.g. button*/, nearest structure-layout /*e.g. menu items
            around button*/ and other useful info on saved screenshot
        TimeConditions_label: Hook the current screenshot
        Use image processing to find the corresponding object on the current screenshot
            /*It's possible to require more information from saved screenshot during search*/
        IF [Object not found] THEN
            IF [Check time condition] /*e.g. it's possible to repeat search 3 times with
                1000-msec step, for example*/ THEN
                GOTO TimeConditions_label
            ELSE
                EXIT with error code /*moreover, it's possible to send corresponding
                    report to log-file*/
            END IF
        ELSE
            Recalculate actions on a base of new found objects /*e.g. recalculate new
                coordinates for mouse click*/
        END IF
    END IF
    Produce actions /*it could be changed actions after image processing; moreover, it's
        possible to finish execution in case of wrong situations during actions*/
END LOOP /*till the end of saved data*/
EXIT
END CCF



Embodiments of the present invention including image analysis and object of
activity detection on two images may be illustrated by the following examples using a
performance analyzer application program. These figures show applying the process
blocks of Figure 2 to a first image from the recording phase (IR) and a corresponding
image from the playback phase (IP). Figure 3 is an example display of the GUI of an
application program captured and saved during a recording phase. This IR screen shot
shows that the item "Tuning Activity" was selected by the user using a mouse. Figure 4
is an example display of the GUI of an application program captured during a playback
phase. Note there are some insignificant changes in the displayed windows in
comparison to Figure 3. Figure 5 is an example image illustrating objects identified
during contouring operations of the recording phase according to an embodiment of the
present invention as performed on the image of Figure 3. Figure 5 shows the sample
output from block 200 of Figure 2. Figure 6 is an example image illustrating objects of
activity of the recording phase according to an embodiment of the present invention as
performed on the image of Figure 5. These contours were identified after performing
blocks 202 and 204 of Figure 2 on the image from Figure 5. The contour with the text
labeled "Tuning" has been determined in this example to be the current object of
activity. Figure 7 is an example image illustrating objects identified during contouring
operations of the playback phase according to an embodiment of the present invention.
This image is output from performing block 206 of Figure 2 on the sample image of
Figure 4. Finally, Figure 8 is an example image illustrating a hypothesis during the
playback phase according to an embodiment of the present invention. Figure 8 shows
hypotheses from Figure 7 for the "Tuning Activity" object of activity from Figure 6.
Size, space, content, and structural filtration of blocks 206-214 has been performed. The
ellipse represents the contour which was selected as the best hypothesis from
performing block 216 of Figure 2. A new point for the mouse click is recalculated
relative to the given object (i.e., the "tuning" display object).

Embodiments of the present invention provide at least several advantages. One
benefit is that the present approach is applicable to any application program exposing a
GUI on any platform and OS, and is not dependent on a specific application
programming interface (API) or architecture of visual system implementation and OS
specifics. In contrast, existing prior art systems are dependent on system APIs while
working with visual elements. Another benefit is that the present approach uses a
structured representation of images and objects, so it makes the execution stable with
respect to visual properties that could change (such as screen resolution, drawing
schemes and changes in data values, shapes, and orders). This allows one execution
scenario (set to a specific image resolution, visual scheme, OS, and computer system) to
be run on many different computer systems that could have different visual
configurations and operating systems. Existing systems that claim to use image
recognition techniques are dependent on the screen resolution and/or drawing schemes.
Also, such systems have strong dependencies on the OS resident in those systems.

Another benefit is that the present system includes parameterization that supports
flexibility and extensibility. A number of conditions and specific processes used may be
easily changed to allow the CCF system to work for objects represented by contours of
different shapes, more strict or flexible time conditions, or more strict or flexible object
sizes and different layout requirements. Embodiments of the present invention allow the
system to adapt to a specific application program family and use cases. For example,
while using this system as a test framework, one can ensure that each action completed
within the required time limits and easily detect performance degradation. Thus,
embodiments of the present invention may be used for capturing and simulating a user
interacting with an application program, and thus may be useful for developing new
GUIs.

Reference in the specification to "one embodiment" or "an embodiment" of the
present invention means that a particular feature, structure or characteristic described in
connection with the embodiment is included in at least one embodiment of the present
invention. Thus, the appearances of the phrase "in one embodiment" appearing in
various places throughout the specification are not necessarily all referring to the same
embodiment.

Although the operations detailed herein may be described as a sequential 
process, some of the operations may in fact be performed in parallel or concurrently. In 
addition, in some embodiments the order of the operations may be rearranged without 
departing from the scope of the invention. 

The techniques described herein are not limited to any particular hardware or
software configuration; they may find applicability in any computing or processing
environment. The techniques may be implemented in hardware, software, or a
combination of the two. The techniques may be implemented in programs executing on
programmable machines such as mobile or stationary computers, personal digital
assistants, set top boxes, cellular telephones and pagers, and other electronic devices,
that each include a processor, a storage medium readable by the processor (including
volatile and non-volatile memory and/or storage elements), at least one input device,
and one or more output devices. Program code is applied to the data entered using the
input device to perform the functions described and to generate output information. The
output information may be applied to one or more output devices. One of ordinary skill
in the art may appreciate that the invention can be practiced with various computer
system configurations, including multiprocessor systems, minicomputers, mainframe
computers, and the like. The invention can also be practiced in distributed computing
environments where tasks may be performed by remote processing devices that are
linked through a communications network.

Each program may be implemented in a high level procedural or object oriented
programming language to communicate with a processing system. However, programs
may be implemented in assembly or machine language, if desired. In any case, the
language may be compiled or interpreted.

Program instructions may be used to cause a general-purpose or special-purpose
processing system that is programmed with the instructions to perform the operations
described herein. Alternatively, the operations may be performed by specific hardware
components that contain hardwired logic for performing the operations, or by any
combination of programmed computer components and custom hardware components.
The methods described herein may be provided as a computer program product that
may include a machine accessible medium having stored thereon instructions that may
be used to program a processing system or other electronic device to perform the
methods. The term "machine accessible medium" used herein shall include any medium
that is capable of storing or encoding a sequence of instructions for execution by a
machine and that cause the machine to perform any one of the methods described
herein. The term "machine accessible medium" shall accordingly include, but not be
limited to, solid-state memories, and optical and magnetic disks. Furthermore, it is
common in the art to speak of software, in one form or another (e.g., program,
procedure, process, application, module, logic, and so on) as taking an action or causing
a result. Such expressions are merely a shorthand way of stating that the execution of the
software by a processing system causes the processor to perform an action or produce a
result.



WO 2006/132564 



PCT/RolOO5/0«<B25 




CLAIMS 

What is claimed is: 

1. A method of automatically controlling execution of an application program
having a graphical user interface comprising:

capturing user input data and images displayed by the graphical user interface
during a recording phase of execution of the application program;

analyzing the captured user input data and displayed images to generate an
execution scenario during the recording phase;

generating simulated user input data based on the execution scenario during a
playback phase of execution of the application program and inputting the simulated user
input data to the application program;

performing image analysis on images displayed by the graphical user interface
as a result of processing the simulated user input data during the playback phase and
captured displayed images from the recording phase; and

automatically controlling execution of the application program based at least in
part on the image analysis.

2. The method of claim 1, wherein performing image analysis comprises
comparing the displayed images captured during the recording phase with displayed
images from the playback phase.
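
By way of illustration only, the comparison recited in claim 2 may be sketched as a per-pixel match ratio between a recorded image and a playback image. The claims do not prescribe a metric; the grayscale list-of-rows image representation, the function name `image_similarity`, and the `tolerance` parameter are assumptions introduced for this sketch.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values, 0..255 (assumed representation)


def image_similarity(recorded: Image, playback: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels whose values differ by at most `tolerance`."""
    if len(recorded) != len(playback) or any(
        len(r) != len(p) for r, p in zip(recorded, playback)
    ):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in recorded)
    if total == 0:
        return 1.0  # two empty images are trivially identical
    matching = sum(
        1
        for rec_row, play_row in zip(recorded, playback)
        for rec_px, play_px in zip(rec_row, play_row)
        if abs(rec_px - play_px) <= tolerance
    )
    return matching / total
```

A nonzero tolerance lets the comparison absorb small rendering differences between the recording and playback environments; any threshold applied to the returned ratio is left to the caller.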

3. The method of claim 1, wherein the user input data comprises at least one of
keyboard input data and mouse input data.

4. The method of claim 1, wherein analyzing the captured user input data and
displayed images in the recording phase comprises identifying actions of the application
program corresponding to the captured user input data.

5. The method of claim 1, wherein the execution scenario comprises a script
including extended markup language (XML) tags.
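
By way of illustration only, an execution scenario of the kind recited in claim 5 may be written to and read from an XML script as sketched below. The tag names (`scenario`, `step`, `mouse`, `keyboard`) and attribute names are hypothetical; the claims state only that the script includes XML tags and do not define a schema.

```python
import xml.etree.ElementTree as ET


def build_scenario(steps):
    """Serialize (device, action, attributes) steps into an XML scenario string."""
    root = ET.Element("scenario")  # hypothetical root tag
    for index, (device, action, attrs) in enumerate(steps):
        step = ET.SubElement(root, "step", {"index": str(index)})
        # Attribute values must be strings for ElementTree serialization.
        event_attrs = {"action": action}
        event_attrs.update({key: str(value) for key, value in attrs.items()})
        ET.SubElement(step, device, event_attrs)
    return ET.tostring(root, encoding="unicode")


def load_scenario(xml_text):
    """Parse an XML scenario back into a list of (device, action, attributes) steps."""
    root = ET.fromstring(xml_text)
    steps = []
    for step in root.findall("step"):
        for event in step:
            attrs = dict(event.attrib)
            action = attrs.pop("action")
            steps.append((event.tag, action, attrs))
    return steps
```

Attribute values come back as strings after parsing, so a playback component built on this sketch would convert coordinates to integers before generating simulated input.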

6. The method of claim 1, wherein analyzing the captured user input data and
displayed images during a recording phase comprises:

determining contours of objects shown in a displayed image;

detecting an object of activity from among the objects; and

detecting additional objects located adjacent to the object of activity.
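
By way of illustration only, the detection steps of claim 6 may be sketched as follows, assuming that object contours have already been reduced to bounding boxes of the form (left, top, width, height) and that the object of activity is the box containing the recorded mouse position. The function name `find_active_and_adjacent` and the `max_gap` adjacency criterion are assumptions of this sketch, not limitations of the claim.

```python
def find_active_and_adjacent(boxes, click, max_gap=10):
    """Return the box containing `click` and the boxes within `max_gap` pixels of it."""

    def contains(box, point):
        x, y, w, h = box
        return x <= point[0] < x + w and y <= point[1] < y + h

    def gap(a, b):
        # Axis-aligned gap between two boxes; 0 when they touch or overlap on an axis.
        dx = max(b[0] - (a[0] + a[2]), a[0] - (b[0] + b[2]), 0)
        dy = max(b[1] - (a[1] + a[3]), a[1] - (b[1] + b[3]), 0)
        return max(dx, dy)

    # The first box under the click is taken as the object of activity.
    active = next((b for b in boxes if contains(b, click)), None)
    if active is None:
        return None, []
    adjacent = [b for b in boxes if b is not active and gap(active, b) <= max_gap]
    return active, adjacent
```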

7. The method of claim 1, wherein performing image analysis comprises
determining contours of objects in an image displayed by the graphical user interface
during the playback phase and filtering the objects according to size to produce a set of
hypotheses for active objects.
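
By way of illustration only, the size filtering of claim 7 may be sketched as follows. In place of a true contour tracer, the sketch substitutes bounding boxes of 4-connected regions in a binary mask; that substitution, the function names, and the relative `tolerance` parameter are assumptions made here and are not drawn from the claims.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def find_bounding_boxes(mask: List[List[int]]) -> List[Box]:
    """Return bounding boxes of 4-connected regions of nonzero cells in `mask`."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes


def filter_by_size(boxes: List[Box], recorded: Box, tolerance: float = 0.2) -> List[Box]:
    """Keep boxes whose width and height are within `tolerance` of the recorded object."""
    _, _, rec_w, rec_h = recorded
    return [
        box
        for box in boxes
        if abs(box[2] - rec_w) <= tolerance * rec_w and abs(box[3] - rec_h) <= tolerance * rec_h
    ]
```

The boxes that survive `filter_by_size` play the role of the set of hypotheses for active objects; position is deliberately ignored at this stage so that an object that has moved between recording and playback remains a candidate.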

8. The method of claim 7, wherein performing image analysis further comprises
filtering objects according to space layout to produce a first subset of hypotheses for
active objects.

9. The method of claim 7, wherein performing image analysis further comprises
filtering objects by content to produce a second subset of hypotheses for active objects.

10. The method of claim 7, wherein performing image analysis further
comprises structural filtering of objects to produce a best hypothesis for an active
object.
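
By way of illustration only, the successive narrowing of hypotheses recited in claims 8 through 10 may be sketched as a cascade of three steps. The claims do not state the filtering criteria, so every criterion below is an assumption: space layout is approximated by requiring an object near each recorded neighbor offset, content by an exact per-pixel match ratio, and structural filtering by a simple ranking that stands in for whatever structural test an embodiment applies. The `Candidate` type and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Candidate:
    x: int                   # left edge of the bounding box
    y: int                   # top edge of the bounding box
    pixels: Tuple[int, ...]  # flattened pixel content (assumed representation)


def layout_filter(cands, objects, neighbor_offsets, slack=2):
    """Keep candidates that have an object near every recorded neighbor offset."""

    def has_neighbor(c, dx, dy):
        return any(
            abs(o.x - (c.x + dx)) <= slack and abs(o.y - (c.y + dy)) <= slack
            for o in objects
            if o is not c
        )

    return [c for c in cands if all(has_neighbor(c, dx, dy) for dx, dy in neighbor_offsets)]


def content_score(c, recorded_pixels):
    """Fraction of pixels identical to the recorded object's pixels."""
    if len(c.pixels) != len(recorded_pixels) or not recorded_pixels:
        return 0.0
    return sum(a == b for a, b in zip(c.pixels, recorded_pixels)) / len(recorded_pixels)


def content_filter(cands, recorded_pixels, threshold=0.8):
    """Keep candidates whose content score reaches `threshold`."""
    return [c for c in cands if content_score(c, recorded_pixels) >= threshold]


def best_hypothesis(cands, recorded_pixels, recorded_xy):
    """Pick the best content score, breaking ties by the smallest displacement."""
    if not cands:
        return None
    rx, ry = recorded_xy
    return max(
        cands,
        key=lambda c: (content_score(c, recorded_pixels), -(abs(c.x - rx) + abs(c.y - ry))),
    )
```

Ordering the steps from cheapest to most selective means the more expensive content comparison runs only on candidates that already fit the recorded layout.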

11. The method of claim 7, wherein performing image analysis further
comprises recalculating old actions for a new object identified as the active object.
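
By way of illustration only, one possible reading of the recalculation in claim 11 is that a recorded mouse position is remapped onto the object newly identified as active. The sketch below assumes bounding boxes of the form (left, top, width, height) and preserves the click's proportional position within the object; the function name `recalculate_action` and this proportional mapping are assumptions and are not stated in the claim.

```python
def recalculate_action(recorded_click, recorded_box, new_box):
    """Map a recorded click onto a relocated (and possibly resized) object."""
    cx, cy = recorded_click
    rx, ry, rw, rh = recorded_box
    nx, ny, nw, nh = new_box
    # Express the click as a fraction of the recorded object's extent.
    fx = (cx - rx) / rw
    fy = (cy - ry) / rh
    # Apply the same fraction inside the newly identified object.
    return (round(nx + fx * nw), round(ny + fy * nh))
```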

12. An article comprising: a machine readable medium containing instructions,
which when executed, result in automatically controlling execution of an application
program having a graphical user interface by

capturing user input data and images displayed by the graphical user interface
during a recording phase of execution of the application program;

analyzing the captured user input data and displayed images to generate an
execution scenario during the recording phase;

generating simulated user input data based on the execution scenario during a
playback phase of execution of the application program and inputting the simulated user
input data to the application program;

performing image analysis on images displayed by the graphical user interface
as a result of processing the simulated user input data during the playback phase and
captured displayed images from the recording phase; and

automatically controlling execution of the application program based at least in
part on the image analysis.

13. The article of claim 12, wherein instructions for performing image analysis
comprise instructions for comparing the displayed images captured during the recording
phase with displayed images from the playback phase.

14. The article of claim 12, wherein the user input data comprises at least one of
keyboard input data and mouse input data.

15. The article of claim 12, wherein instructions for analyzing the captured user
input data and displayed images in the recording phase comprise instructions for
identifying actions of the application program corresponding to the captured user input
data.

16. The article of claim 12, wherein the execution scenario comprises a script 
including extended markup language (XML) tags. 

17. The article of claim 12, wherein instructions for analyzing the captured user
input data and displayed images during a recording phase comprise instructions for:

determining contours of objects shown in a displayed image;

detecting an object of activity from among the objects; and

detecting additional objects located adjacent to the object of activity.

18. The article of claim 12, wherein instructions for performing image analysis
comprise instructions for determining contours of objects in an image displayed by the
graphical user interface during the playback phase and filtering the objects according to
size to produce a set of hypotheses for active objects.

19. The article of claim 18, wherein instructions for performing image analysis
further comprise instructions for filtering objects according to space layout to produce a
first subset of hypotheses for active objects.

20. The article of claim 18, wherein instructions for performing image analysis
further comprise instructions for filtering objects by content to produce a second subset
of hypotheses for active objects.

21. The article of claim 18, wherein instructions for performing image analysis
further comprise instructions for structural filtering of objects to produce a best
hypothesis for an active object.

22. The article of claim 18, wherein instructions for performing image analysis
further comprise instructions for recalculating old actions for a new object identified as
the active object.

23. A cognitive control framework system for automatically controlling
execution of an application program having a graphical user interface comprising:

a recording component adapted to capture user input data and images displayed
by the graphical user interface during a recording phase of execution of the application
program, and to analyze the captured user input data and displayed images to generate
an execution scenario during the recording phase; and

a playback component adapted to generate simulated user input data based on
the execution scenario during a playback phase of execution of the application program,
to input the simulated user input data to the application program, to perform image
analysis on images displayed by the graphical user interface as a result of processing the
simulated user input data during the playback phase and captured displayed images
from the recording phase, and to automatically control execution of the application
program based at least in part on the image analysis.

24. The system of claim 23, wherein the playback component is adapted to
compare the displayed images captured during the recording phase with displayed
images from the playback phase.

25. The system of claim 23, wherein the recording component is adapted to
identify actions of the application program corresponding to the captured user input
data.

26. The system of claim 23, wherein the recording component is adapted to 
determine contours of objects shown in a displayed image, detect an object of activity 
from among the objects, and detect additional objects located adjacent to the object of 
activity. 

27. The system of claim 23, wherein the playback component is adapted to
determine contours of objects in an image displayed by the graphical user interface
during the playback phase and filter the objects according to size to produce a set of
hypotheses for active objects.

28. The system of claim 27, wherein the playback component is adapted to filter
objects according to space layout to produce a first subset of hypotheses for active
objects.

29. The system of claim 27, wherein the playback component is adapted to filter
objects by content to produce a second subset of hypotheses for active objects.

30. The system of claim 27, wherein the playback component is adapted to
structurally filter objects to produce a best hypothesis for an active object.